On the Use of Prosodic Labelling in Corpus-Based Linguistic Studies of Spontaneous Speech

Authors

  • Daniela Braga
  • Diamantino Freitas
  • João Paulo Ramos Teixeira
  • Aldina Marques

Abstract

This paper addresses the construction of a spontaneous speech corpus in European Portuguese (hereafter EP): the corpus is presented, and the prosodic labelling scheme proposed here is explained. The objective of this work is to provide a tool for linguistic analysis suited to several research topics that take speech and dialogue as their object. The main features considered in the database are described and justified. Methodological problems, as well as some prosodic and pragmatic phenomena observed while labelling the speech signal, are also presented. Applications to pragmatic studies, speech synthesis and prosodic phonology are discussed. Our purpose is to make this work available to the scientific community, since no other database of this kind is available and computationally accessible for EP. Future directions for the ongoing work are also outlined.

Similar resources

Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation

In the area of large French speech corpora, there is a demonstrated need for a common prosodic notation system allowing for easy data exchange, comparison, and automatic annotation. The major questions are: (1) how to develop a single simple scheme of prosodic transcription which could form the basis of guidelines for non-expert manual annotation (NEMA), used for linguistic teaching and researc...

M = Syntax + Prosody: A syntactic-prosodic labelling scheme for large spontaneous speech databases

In automatic speech understanding, division of continuous running speech into syntactic chunks is a great problem. Syntactic boundaries are often marked by prosodic means. For the training of statistical models for prosodic boundaries large databases are necessary. For the German Verbmobil (VM) project (automatic speech-to-speech translation), we developed a syntactic-prosodic labelling scheme ...

CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech

This paper describes speech data recording, processing and annotation of a new speech corpus CoRuSS (Corpus of Russian Spontaneous Speech), which is based on connected communicative speech recorded from 60 native Russian male and female speakers of different age groups (from 16 to 77). Some Russian speech corpora available at the moment contain plain orthographic texts and provide some kind of ...

Acoustic and Linguistic Information Based Chinese Prosodic Boundary Labelling

The paper analyzes both acoustic and linguistic features with different Chinese prosodic boundaries. Then a rule-learning approach was used to do the prosodic boundary labelling. In the paper the prosodic boundaries are classified into four levels, full intonational boundary with strong intonational marking with/without lengthening or change in speech tempo, prosodic phrase boundary with rather...

The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages

The C-ORAL-ROM project has delivered a multilingual corpus of spontaneous speech for the main Romance languages (Italian, French, Portuguese and Spanish). The collection aims to represent the variety of speech acts performed in everyday language and to enable the description of prosodic and syntactic structures in the four Romance languages. Sampling criteria are defined in a corpus design sche...

Publication date: 2003